ABSTRACT

Increasingly sophisticated weaponry requires that U.S. military organizations have timely and responsive tactical command and control systems. Automation is one obvious means of accomplishing this goal. This paper is a simulation study of file organizations typical of command and control systems. It reports the findings of a comparative analysis of five different file organizations to determine their responsiveness to five types of commonly used application subroutines. It also identifies areas for future research on the file organizations of command and control systems.
I. INTRODUCTION

The potential enemies of the United States are developing, or are already being equipped with, increasingly sophisticated weapons systems. Future warfare therefore promises to be more complex and faster-moving than ever before. This threat requires the military organization of the United States to ensure that the efforts of all combat arms are closely coordinated and interleaved to achieve maximum combat effectiveness. Automation, one obvious answer to this need, promises improved tactical effectiveness by providing faster response times, powerful computational aids, and more complete, conveniently available information that enables decision-makers to better understand and coordinate the battlefield situation.

Over the years the Marine Corps has attempted to infuse automation into different levels of command. To a great extent, however, this has been an uncoordinated effort. In 1964 the Marine Corps initiated the development of an overall tactical command and control system, now known as the Marine Tactical Command and Control System (MTACCS). See Figure 1. This system contains the following seven subsystems (1):

(1) Tactical Combat Operations System (TCO)

(2) Marine Air Command and Control System (MACCS)

(3) Marine Integrated Fire and Air Support System (MIFASS)

(4) Marine Integrated Personnel System (MIPS)

(5) Marine Integrated Logistics System (MILOGS)

(6) Marine Air Ground Intelligence System (MAGIS)

(7) Communications System (COMMS)


Figure 1.
Marine Tactical Command and Control System (MTACCS)


The Marine Integrated Fire and Air Support System (MIFASS) is currently undergoing a two-year development and evaluation at the test bed of the Marine Corps Tactical Systems Support Activity at Camp Pendleton, California.

It has become obvious in the MIFASS development that, with dynamically changing tactical situations, variable unit deployments and similar factors, the degree of change required of the data base will demand extreme flexibility in the handling of data. Providing this flexibility by reprogramming in the field would be unreasonable in a battlefield environment. Hence, some form of generalized data management system (GDMS) would be required in order to free military units from this arduous task.

Without the sophisticated approach to software changes afforded by a GDMS, all message formats and user application programs must be tied directly to fixed file organizations and formats. Each user application programmer must know the precise location of every data field in the records so that the field can be accessed. As a result, each user application program must be written to access the data fields in a fixed file record format. This conventional approach is simple and straightforward as long as the input formats, file formats and output formats never change. Inevitably, however, user application program requirements change, resulting in a series of additional format changes to ensure compatibility in all processing and in all program inputs and outputs. Ensuring this compatibility is not a trivial matter; it is costly in both time and human resources.

To avoid these types of problems, a GDMS can be used, making the maintenance of, and interaction with, a data base a relatively simple chore. Changes to files do not affect application programs or input formats. Conversely, changes to the input formats do not require reprogramming or file structure changes. In effect, a GDMS makes the user's programs and formats independent of the data base organization. This allows the tactical user to interact with the system through simplified procedures as he creates, deletes, or modifies data and/or message/display formats. Freed from lengthy and complicated data handling procedures, the tactical user can concentrate on his primary responsibility: reviewing, manipulating and reacting to the data content. (2)

To date, MIFASS contains seven application programs:

(1) Fire Mission Analysis

(2) Air Support Control

(3) Technical Fire Control

(4) Troop Safety

(5) Target Data

(6) Conflict Detection

(7) Mission Scheduling and Monitoring

These programs are interdependent. That is, each of the seven tactical application programs depends on the others' outputs throughout various stages of processing and analysis. For example, before the Fire Mission Analysis application program completes, Troop Safety and Conflict Detection must interact in order to provide indications of unsafe conditions to the weapon selection display. This display is presented to a fire support coordinator, who must make the final decision as to who will provide the firepower on the target to be attacked.


Data bases for military tactical command and control systems will, of necessity, be quite large. It has been estimated from the results of load analysis that the memory requirements for MIFASS alone will be approximately 120 million bits. (3) The problem, then, lies in developing a file structure that is responsive and efficient under the demands placed on it by command and control needs. A substantial number of large files will be associated with the Marine Tactical Command and Control System; one of these, the Decision Logic Table, contains over 900 records.

A data base can be organized in any one of many configurations, each with its own advantages and disadvantages. Naturally, it will be necessary to determine the primary application subroutines that will be applied to the data base when accessing the file structure. In tactical systems, for example, responsiveness to queries must be considered paramount over other data base design criteria such as storage requirements or programming complexity. A review of the Fire Mission Analysis application program reveals that six application subroutine queries are used to access the data base during its processing. Through these queries, different lists of data are extracted from the data base, on which the program can then operate. For example, a query is made to retrieve the weapon list from the Decision Logic Table, where all potentially acceptable weapons systems are listed. A subsequent query retrieves from the unit file the units available with the proper weapons systems. Thus, one of the primary functions of the application subroutine queries will be to extract lists of data from the data base.


A recent analysis of the time processing profile of the Fire Mission Analysis application program revealed that 43 percent of the total execution time is consumed by searching the data base in response to application subroutine queries. (4) This emphasizes the importance of an efficient data base, one that minimizes the time spent on searching and retrieval.

The primary purpose of the work done in this paper was to conduct a comparative analysis of five different file organizations and determine their responsiveness to five types of commonly used application subroutines associated with tactical command and control systems. A secondary purpose was to conduct exploratory research on file organizations. This area has not received adequate attention in the past, and it was hoped that the work would uncover areas for future research.

The remaining sections of this paper are organized as follows:

(1) Section II presents the definitions, file structures and search techniques used in the paper.

(2) Section III establishes the parameters of the application subroutines.

(3) Section IV outlines both the file structures and application processes used.

(4) Section V presents the data gathered in the file organization comparison.

(5) Section VI identifies possible future research in the area of file organization, specifically as it relates to the MTACCS test bed.

(6) Section VII outlines the conclusions drawn from the file organization comparison.
II. DATA BASE ORGANIZATION

A. DEFINITIONS

This section presents a formal approach to the file structures studied and their concomitant information retrieval schemes. Each of these file structures is characterized and classified. Similarly, the methods of information retrieval used with these files are categorized.

Before examining the structure of any file or its retrieval schemes, it is necessary to establish a common reference for the principal terms employed in this paper. Figure 2 provides a model of a typical generalized file structure, and all examples included with the definitions below make use of this figure. The tree in Figure 3 describes the hierarchy of the file structure. The definitions set forth below are in accordance with those presented by Hsiao and Harary. (5)

An ELEMENTARY DATA ITEM E is the smallest unit of information that is processed. For example, in Figure 2 each last name, rank and pointer is an elementary data item.

A RECORD R is an ordered collection of elementary data items. These elementary data items are the attributes that make up the record, and each attribute has a single value. In Figure 2 the values DOE, CPL, 0311 and 121 make up a record for the attributes last name, rank, military occupational specialty (MOS) and pointer.

A KEYWORD K is any elementary data item within a record. It is the means by which a record is referenced. Keywords may be subscripted, as in $K_i$, to indicate distinct values. In Figure 2 the last names Doe, Jones and Smith are keywords.

Figure 2.
Generalized File Structure

The ADDRESS a of a record is represented by a positive integer and indicates the location of the record in some type of storage medium. Each record has a unique address. In Figure 2 the unique addresses of the respective records are shown in the file as 002, 007, 011, 121 and 145.

A HASH ADDRESS $f(K_i)$ is an address derived by transforming a keyword $K_i$ by a function f, such that $f(K_i) = a_i$. For further explanation of this process see page 37.

A record R may contain an elementary data item called the K-POINTER of R. The pointer is the address of another record that contains the same keyword. The null pointer indicates the end of a sequence of K-pointer-linked records. In Figure 2 the elementary data items 121 and 145 are pointers, and 0 is the null pointer.

A K-LIST is a set of records containing a common keyword. The list may contain only one record. Also, several K-lists may be associated with each keyword. Within one K-list, the K-pointers point only to records within that K-list. As shown in Figure 2, the records at addresses 002 and 121 form a K-list.

A FILE F is a collection of records with the same elementary data items. Every K-list containing one or more of these records must be contained within the file. In Figure 2 each record is made up of the same four elementary data items: last name, rank, MOS and pointer. These records are linked by means of K-pointers into K-lists. Each K-list within the file represents those records containing a common last name keyword.

A DIRECTORY D for a file is a set of sequences of the form

$$(K_i, h_i, n_i;\ a_{i1}, a_{i2}, \ldots, a_{in_i}) \quad \text{for } i = 1, 2, \ldots, m,$$

where the elementary data items within each sequence represent, respectively, $K_i$, the i-th keyword; $h_i$, the number of records containing keyword $K_i$; $n_i$, the number of K-lists for $K_i$ within the file; and $a_{ij}$, the beginning address of the j-th $K_i$-list. (5) For an example, see the sequences containing last name, $h_i$, $n_i$ and the addresses in the rectangle marked "directory" in Figure 7.
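To make the directory definition concrete, the following Python sketch derives the sequences $(K_i, h_i, n_i; a_{i1}, \ldots, a_{in_i})$ from a small pointer-linked file. The record values are hypothetical (they are not read from Figure 2), and `build_directory` is an illustrative name. The sketch assumes that a record begins a K-list exactly when no other record's K-pointer points to it.

```python
from collections import defaultdict

# Hypothetical records: address -> (last name keyword, K-pointer).
# 0 is the null pointer. Values are illustrative, not taken from Figure 2.
RECORDS = {
    2: ("DOE", 121),
    7: ("JONES", 145),
    11: ("SMITH", 0),
    121: ("DOE", 0),
    145: ("JONES", 0),
}

def build_directory(records):
    """Return {K_i: (h_i, n_i, [a_i1, ..., a_in_i])} for a pointer-linked file."""
    # A record that some K-pointer points to cannot be the start of a K-list.
    pointed_to = {ptr for _, ptr in records.values() if ptr != 0}
    counts = defaultdict(int)   # h_i: number of records containing K_i
    heads = defaultdict(list)   # beginning addresses of the K_i-lists
    for addr in sorted(records):
        keyword, _ = records[addr]
        counts[keyword] += 1
        if addr not in pointed_to:
            heads[keyword].append(addr)
    return {k: (counts[k], len(heads[k]), heads[k]) for k in sorted(counts)}

for keyword, (h, n, addrs) in build_directory(RECORDS).items():
    print(keyword, h, n, addrs)
```

If the two DOE records were not linked (both pointers 0), the same function would report $h_i = 2$ and $n_i = 2$, i.e. two one-record K-lists.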

A GENERALIZED FILE STRUCTURE consists of two items: a file F and its directory D. Figure 2 is an example of a generalized file structure, as are the files studied in this paper.

B. FILE STRUCTURES

1. Sequential Organization

In a sequential file structure, for every keyword $K_i$, $h_i = n_i = 1$ and $a_{11} < a_{21} < \cdots < a_{m1}$. (5) For example, if the last name is chosen as the keyword, the records are stored contiguously in alphabetical order of last name. The records in the file are in one-to-one correspondence with the directory sequences, as there is one keyword K for each record. See Figure 4.
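A minimal sketch of building a sequential file under the definition above: records are placed contiguously in alphabetical order of last name, and each record gets its own directory sequence with $h_i = n_i = 1$. The names and the starting address are hypothetical.

```python
# Hypothetical last names; addresses are assigned here, not taken from a figure.
NAMES = ["SMITH", "DOE", "JONES", "ADAMS", "DOE"]

def build_sequential(names, first_address=1):
    """Store records contiguously in keyword order; one directory entry per record."""
    directory = []
    for offset, name in enumerate(sorted(names)):
        # Each sequence is (K_i, h_i, n_i, a_i1) with h_i = n_i = 1.
        directory.append((name, 1, 1, first_address + offset))
    return directory

directory = build_sequential(NAMES)
addresses = [entry[3] for entry in directory]
assert addresses == sorted(addresses)  # a_11 < a_21 < ... < a_m1
for entry in directory:
    print(entry)
```

Duplicate last names (DOE here) simply occupy adjacent positions, each with its own directory sequence.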

2. Multilist Organization

In a multilist file structure there is exactly one K-list per keyword; that is, $n_i = 1$ for every i. (5) In this file a record R is a member of the $K_i$-list whenever R contains the keyword $K_i$. The directory sequences of the multilist are in one-to-one correspondence with the $K_i$-lists, and only the beginning address $a_{i1}$ of each $K_i$-list occurs in the directory. Successive records within the $K_i$-list are obtained by means of the K-pointer of R, with the null pointer terminating the sequence. See Figure 5. Referring now to Figure 2, Jones would be the keyword by which all records containing the last name Jones are referenced. The directory would contain only one occurrence of the keyword Jones and a single address of a record with this keyword. Subsequent records in the Jones K-list are linked by means of pointers.
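The following sketch builds a multilist file from hypothetical (address, last name) pairs. Records are linked in ascending address order, which is an assumption made for illustration; the text does not prescribe an order. The directory keeps one entry per keyword, holding $h_i$, $n_i = 1$ and the single beginning address.

```python
# Hypothetical file: record address -> last name (illustrative values only).
FILE = {2: "DOE", 7: "JONES", 11: "SMITH", 121: "DOE", 145: "JONES"}

def build_multilist(records):
    """Link every record with the same keyword into one K-list (n_i = 1)."""
    pointers = {}    # address -> K-pointer (0 is the null pointer)
    directory = {}   # K_i -> (h_i, n_i, beginning address)
    last_seen = {}   # K_i -> address of the current tail of its K-list
    for addr in sorted(records):
        key = records[addr]
        pointers[addr] = 0                    # new record ends its K-list for now
        if key in last_seen:
            pointers[last_seen[key]] = addr   # old tail now points to the new record
            h, n, head = directory[key]
            directory[key] = (h + 1, n, head)
        else:
            directory[key] = (1, 1, addr)
        last_seen[key] = addr
    return directory, pointers

def retrieve(directory, pointers, key):
    """Follow K-pointers from the single directory address to the null pointer."""
    found = []
    addr = directory[key][2] if key in directory else 0
    while addr != 0:
        found.append(addr)
        addr = pointers[addr]
    return found

directory, pointers = build_multilist(FILE)
print(directory["JONES"], retrieve(directory, pointers, "JONES"))
```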

3. Inverted Organization

In an inverted file structure each elementary data item contained within the record R is designated a keyword K within the directory D, such that every K-list contains one and only one record; that is, $h_i = n_i$ for all i. (5) See Figure 6. The directory of an inverted file is usually quite large, because for every keyword there is an associated sequence of record addresses $a_{ij}$. By assigning to each keyword the addresses of all the records that contain that attribute value, one need only locate the addresses associated with a given $K_i$ in the directory to find the set of records containing that value. Because of this association, a record address may appear many times throughout the directory, in the many ($K_i$, $a_{ij}$) associations. Figure 7 shows how Figure 2 would appear if inverted.

In a partially inverted file structure only a subset of the elementary data items contained within the record is selected as keywords. This type of file structure is often substituted for the "fully" inverted file when known access or retrieval requirements are based solely on selected attributes.
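The distinction between fully and partially inverted directories can be sketched as follows. Each (attribute, value) pair acts as a keyword whose directory entry lists the addresses of every record holding it; because each K-list here is a single record, $h_i = n_i$ is just the length of that list. The record contents and field names are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: address -> attribute values (illustrative only).
RECORDS = {
    2: {"last_name": "DOE", "rank": "CPL", "mos": "0311"},
    7: {"last_name": "JONES", "rank": "SGT", "mos": "0311"},
    121: {"last_name": "DOE", "rank": "PFC", "mos": "0811"},
}

def invert(records, attributes):
    """Map each (attribute, value) keyword to the ascending addresses holding it.

    Passing every attribute gives a fully inverted directory; passing a subset
    gives a partially inverted one.
    """
    directory = defaultdict(list)
    for addr in sorted(records):
        for attr in attributes:
            directory[(attr, records[addr][attr])].append(addr)
    return dict(directory)

full = invert(RECORDS, ["last_name", "rank", "mos"])
partial = invert(RECORDS, ["last_name"])
print(full[("mos", "0311")])     # every record with MOS 0311
print(len(full), len(partial))   # the fully inverted directory is larger
```

Address 2 appears under three different keywords in `full`, which is the repetition of record addresses noted above.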

4. Random Organization

A random file structure is a variation of a generalized file structure in which a directory of keywords does not exist. Instead, the keywords are transformed into addresses, which in turn form a listing that can be thought of as a directory. In this file organization a record R is stored and retrieved on the basis of a predictable relationship between exactly one keyword K of the record and the address a of the record. This relationship consists of transforming, or "hashing," a keyword K of a record R into a numeric address. The calculation first transforms the keyword into a numeric value by an algorithm chosen for its efficient use of storage space. The numeric result is then divided by a divisor based on the number of available directory addresses, and the remainder after the division becomes a direct entry point to the directory. Since this method is not perfect, a "collision" will sometimes occur, whereby different keywords map into the same directory location. When a collision occurs, a K-pointer of R is established from a designated record in the K-list to the new record. See Figure 8.

Figure 8.
Random (Calculation) Organization
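The division-remainder scheme can be sketched in a few lines. The keyword-to-number transformation used here (the sum of character codes) and the divisor of 7 are assumptions chosen for illustration; the text does not specify either. Collisions are resolved by chaining a K-pointer from the last record in the slot's K-list to the new record.

```python
# Hypothetical transform (sum of character codes) and divisor; both are assumptions.
DIVISOR = 7  # assumed number of available directory addresses

def hash_address(keyword, divisor=DIVISOR):
    numeric = sum(ord(ch) for ch in keyword)  # keyword -> numeric value
    return numeric % divisor                  # remainder = direct entry point

class RandomFile:
    def __init__(self, divisor=DIVISOR):
        self.divisor = divisor
        self.slots = [None] * divisor          # each slot heads a chain of records

    def store(self, keyword, data):
        """Store a record; return True if its slot was already occupied (a collision)."""
        record = {"keyword": keyword, "data": data, "next": None}
        slot = hash_address(keyword, self.divisor)
        if self.slots[slot] is None:
            self.slots[slot] = record
            return False
        tail = self.slots[slot]
        while tail["next"] is not None:
            tail = tail["next"]
        tail["next"] = record                   # K-pointer to the new record
        return True

    def retrieve(self, keyword):
        """Hash the keyword, then compare keywords along the chain."""
        found = []
        record = self.slots[hash_address(keyword, self.divisor)]
        while record is not None:
            if record["keyword"] == keyword:
                found.append(record["data"])
            record = record["next"]
        return found

rf = RandomFile()
rf.store("AMOS", "record 1")
collided = rf.store("SOMA", "record 2")  # anagram: same character-code sum
print(collided, rf.retrieve("SOMA"))     # True ['record 2']
```

Anagrams always collide under this particular transform, which illustrates why the choice of transformation matters in practice.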

5. Ring Organization

A ring file structure is a multilist organization with one major difference: there is no null pointer terminating the K-list sequence. What would normally be the null pointer of the last record in a $K_i$-list is instead a K-pointer back to the beginning address of the $K_i$-list, $a_{i1}$. Thus the $K_i$-list of a ring may be traversed continuously and completely from any record within it. See Figure 9.
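A sketch of the ring property: the last record's K-pointer returns to the first, so a traversal can begin at any member and stops when it returns to its starting point. The addresses are hypothetical.

```python
def build_ring(addresses):
    """Return {address: K-pointer}; the last record points back to the first."""
    ring = {}
    for i, addr in enumerate(addresses):
        ring[addr] = addresses[(i + 1) % len(addresses)]
    return ring

def traverse_from(ring, start):
    """Visit every record of the ring exactly once, starting at any member."""
    visited = [start]
    addr = ring[start]
    while addr != start:
        visited.append(addr)
        addr = ring[addr]
    return visited

ring = build_ring([7, 145, 300])   # hypothetical K-list addresses
print(traverse_from(ring, 145))    # [145, 300, 7]
```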

C. SEARCH TECHNIQUES

1. General

Any directory sequence of a file may be written as $(K_i, h_i, n_i; \mathbf{a}_i)$, where $\mathbf{a}_i$ denotes the beginning addresses $a_{i1}, a_{i2}, \ldots, a_{in_i}$ of the $K_i$-lists. There is a function f that specifies how the beginning addresses of the $K_i$-lists in $\mathbf{a}_i$ may be traversed. (5) The function's domain is the keyword $K_i$ together with a variable address x in $\mathbf{a}_i$; its range is a single address y in $\mathbf{a}_i$. The null address for both x and y is denoted by 0. Thus

$$
y = f(K_i, x) =
\begin{cases}
\min_j (a_{ij}), & x = 0; \\
0, & x = \max_j (a_{ij}); \\
\min_j \{\, a_{ij} : a_{ij} > x \,\}, & \text{otherwise.}
\end{cases}
$$

Each $K_i$ in the directory of a generalized file structure is associated with $n_i$ distinct beginning addresses. The process of generating these beginning addresses is called decoding the keyword $K_i$. To decode a keyword $K_i$, the function f is first applied to the initial value of the variable address, x = 0, to produce the beginning address of the first K-list. Successive applications of the function to the address most recently determined then produce each subsequent, higher K-list address. Finally, when the function produces the null address, decoding of $K_i$ ceases, since all $K_i$-lists have been determined. Thus, beginning with x = 0 and applying the function to successive values of y until the null address is reached, the decoded values are

$$
\begin{aligned}
f(K_i, 0) &= a_{i1} \\
f(K_i, a_{i1}) &= a_{i2} \\
&\ \,\vdots \\
f(K_i, a_{i,n_i-1}) &= a_{in_i} \\
f(K_i, a_{in_i}) &= 0.
\end{aligned}
$$


Referring to Figure 7, the initial application of the directory function

$$f(\text{Doe}, 0) = 002$$

produces the address of the first record in the initial Doe K-list, located at address 002. The next application of the function

$$f(\text{Doe}, 002) = 121$$

produces the address of the record beginning the second Doe K-list. With one more application of the function

$$f(\text{Doe}, 121) = 0$$

the null address is determined, and the traversal ceases for the keyword Doe, since all K-lists with this keyword have been located. Note that in this example each Doe K-list contains only one record.
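The piecewise definition of f and the decoding loop can be checked with a short sketch. The Doe addresses follow the worked example above; holding the beginning addresses in a dictionary keyed by keyword is an assumption made for illustration.

```python
# Beginning addresses of each K_i-list. The DOE entry follows the worked
# example; the dictionary layout itself is an assumption.
DIRECTORY = {"DOE": [2, 121]}

def f(keyword, x):
    """Directory traversing function f(K_i, x), following the piecewise definition."""
    addrs = DIRECTORY[keyword]
    if x == 0:
        return min(addrs)
    if x == max(addrs):
        return 0
    return min(a for a in addrs if a > x)

def decode(keyword):
    """Apply f from x = 0 until the null address; return all K-list beginnings."""
    result = []
    y = f(keyword, 0)
    while y != 0:
        result.append(y)
        y = f(keyword, y)
    return result

print(f("DOE", 0), f("DOE", 2), f("DOE", 121))  # 2 121 0
print(decode("DOE"))                             # [2, 121]
```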

There is also a function g for any file F that specifies how each element of a K-list may be traversed. (5) The domain of g is the Cartesian product $K \times A$ of the set K of all keywords in F with the set A of all addresses in F, and the range of g is in A. Thus

$$y = g(K_i, x)$$

where y is the $K_i$-pointer of the record whose address is x. For a record R to be retrieved, the $K_i$-pointer of R must have been produced by the function g. In other words, a record can be considered retrieved only if its address has been used by the function g to produce a pointer. Again using the example illustrated in Figure 2, the first application of the traversing function f produces the address of the first record in the Doe K-list, 002. Then, by applying g to this address and the same keyword Doe,

$$g(\text{Doe}, 002) = 121$$

the address of the next record in the K-list is retrieved. Finally,

$$g(\text{Doe}, 121) = 0$$

indicates the absence of any more records in the Doe K-list.
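The K-list traversal by g can be sketched the same way. The pointers reproduce the Doe K-list of Figure 2 (002 to 121 to null); storing them in a dictionary keyed by (keyword, address) is an illustrative assumption. Following the definition above, a record counts as retrieved once g has been applied to its address.

```python
# K-pointers of the Doe K-list in Figure 2; the storage layout is an assumption.
K_POINTERS = {("DOE", 2): 121, ("DOE", 121): 0}

def g(keyword, x):
    """Return the K-pointer of the record at address x."""
    return K_POINTERS[(keyword, x)]

def traverse_k_list(keyword, start):
    """Apply g from the K-list's first address until the null pointer."""
    retrieved = []
    x = start
    while x != 0:
        retrieved.append(x)   # g is applied to x next, so x counts as retrieved
        x = g(keyword, x)
    return retrieved

print(g("DOE", 2), g("DOE", 121))  # 121 0
print(traverse_k_list("DOE", 2))   # [2, 121]
```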

2. Sequential and Multilist Organizations

The search technique used for the sequential and multilist organizations is a keyword-by-keyword search through the file directory for a matching $K_i$. The directory of the sequential file contains one keyword per record in the file; the multilist directory contains one unique keyword for each set of records containing that keyword. The search consists of a logical comparison of each $K_i$ in the directory (for $i = 1, \ldots, m$) with the search keyword $K'$ of the record(s) to be retrieved. When $K' = K_i$, the $K_i$-list of all records containing $K_i$ as a keyword is traversed and retrieved. These steps are accomplished by successive applications of the traversing function g, starting with $g(K_i, a_{ij})$ and continuing until g returns the null pointer. In the sequential file, $K'$ must be compared with subsequent keywords until $K' \neq K_i$ to ensure that all records containing $K'$ are retrieved.
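A sketch of the keyword-by-keyword search for the sequential case, counting directory comparisons since those drive search cost. The directory contents are hypothetical. After a match, the scan continues until $K' \neq K_i$; in a multilist directory, where each keyword appears once, it could stop at the first match.

```python
# Hypothetical sequential directory: one (keyword, address) entry per record,
# stored in keyword order.
SEQ_DIRECTORY = [("ADAMS", 1), ("DOE", 2), ("DOE", 3), ("JONES", 4), ("SMITH", 5)]

def sequential_search(directory, target):
    """Compare K' with each K_i in turn; stop at the first K' != K_i after a match."""
    comparisons = 0
    found = []
    for keyword, address in directory:
        comparisons += 1
        if keyword == target:
            found.append(address)
        elif found:   # past the run of matching keywords
            break
    return found, comparisons

print(sequential_search(SEQ_DIRECTORY, "DOE"))  # ([2, 3], 4)
```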

3. Partially Inverted Organization

Two search techniques are employed with the partially inverted organization: an index-sequential technique and a binary search technique. The directory and file contents are the same for each technique. The file is partially inverted on three separate elementary data items, and the directory is indexed on each of these three items, allowing immediate access to the relevant part of the directory when that data item is chosen as the search keyword ($K'$).


The index-sequential search for keyword $K_i$ proceeds as described above for the sequential and multilist file organizations, with the additional capability of directly accessing the subset of directory keywords corresponding to the attribute of $K'$. Likewise, the $K_i$-lists of all records containing $K_i$ are traversed, and the records retrieved, by appropriate application of the traversing functions.
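A sketch of the index-sequential idea: the index jumps straight to the part of the directory for the attribute of $K'$, and only that part is scanned in order. The attribute names, interest code values and addresses are hypothetical.

```python
# Hypothetical partially inverted directory, indexed by the attribute on which
# it was inverted. Each entry is (keyword value, addresses of records holding it).
INDEX = {
    "last_name": [("DOE", [2, 121]), ("JONES", [7]), ("SMITH", [11])],
    "interest_1": [("104", [7, 11]), ("210", [2, 121])],
}

def index_sequential_search(index, attribute, target):
    """Go directly to the attribute's portion of the directory, then scan it in order."""
    comparisons = 0
    for keyword, addresses in index[attribute]:
        comparisons += 1
        if keyword == target:
            return addresses, comparisons
    return [], comparisons

print(index_sequential_search(INDEX, "interest_1", "210"))  # ([2, 121], 2)
```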

In the binary search for keyword $K_i$, the traversing function f is not used; rather, the directory is sampled in the middle to test whether $K' = K_i$. If $K' > K_i$, the first half of the keywords in the directory is eliminated from further comparison; if $K' < K_i$, the latter half of the directory is eliminated. The remaining half of the directory is then sampled again, and the process is repeated until $K' = K_i$. The $K_i$-lists are then traversed and the records retrieved as described for the index-sequential technique.
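The halving procedure can be sketched as an ordinary binary search over an alphabetically ordered directory, counting how many directory entries are sampled. The directory contents are hypothetical; the sketch also returns an empty result when $K'$ is absent, a case the description above leaves implicit.

```python
# Hypothetical directory: keywords in ascending order, each with its address list.
DIRECTORY = [("ADAMS", [5]), ("DOE", [2, 121]), ("JONES", [7, 145]),
             ("SMITH", [11]), ("YOUNG", [30])]

def binary_search(directory, target):
    """Sample the middle keyword and discard the half that cannot contain K'."""
    low, high = 0, len(directory) - 1
    probes = 0
    while low <= high:
        mid = (low + high) // 2
        keyword, addresses = directory[mid]
        probes += 1
        if target == keyword:
            return addresses, probes
        if target > keyword:
            low = mid + 1    # K' > K_i: eliminate the first half
        else:
            high = mid - 1   # K' < K_i: eliminate the latter half
    return [], probes

print(binary_search(DIRECTORY, "SMITH"))  # ([11], 2)
```

For a directory of 957 distinct last names (the count reported in Section IV), this procedure samples at most 10 entries, against up to 957 comparisons for a keyword-by-keyword scan.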

4. Random Organization

The search technique used for the random organization is a two-phase process. The first phase transforms the search keyword $K'$ into a hash address, $f(K')$. The second phase consists of a series of logical comparisons of the ordered elements in the K-list associated with $f(K')$ until the appropriate record(s) are retrieved. See page 37 for further amplification of this process.

5. Ring Organization

The search technique used for the ring organization is the same as that used for the multilist organization, described above.
III. APPLICATION SUBROUTINES

A. GENERAL

A tactical command and control system requires not only the ability to gather data and retrieve it selectively, but also the ability to make this data available to system application programs, e.g., the Fire Mission Analysis program. Such a system cannot serve solely for the storage and retrieval of rigidly formatted data; rather, it must be capable of answering information needs by supplying facts that may depend on complex interrelationships within the data. The system normally provides a rationale for structuring data and a means for managing and querying the data base. In this paper, the search and retrieval operations used to query the data base are collectively referred to as application subroutines.

User programs and data remain independent resources to be combined as the need arises. The system maintains information about the location of the data in the file directories. It also maintains information about the input and output requirements of the user's program and can transform the existing data to meet those requirements.

Any tactical data handling system must be capable of working in response to user commands. The user treats his program requirements as a set of operators. The data base is treated as a set of operands to be bound to the operators by means of various application programs, which may be either lengthy processes consisting of many tasks executed over large files of data, or simple functions consisting of a single operation on a small unit of data.
Common to all system applications are the requirements for the storage and retrieval of data. Because these operations are essential to all data manipulation processes and common to all application programs, the five application subroutines described below were selected for use in the file structure analysis for this paper.
B. HIGH ACCESS

A high access application subroutine is defined as a single application in which 60 percent or more of the records contained within a file are accessed for the purpose of executing some type of operation. For high access applications both the directory keywords and the search keywords are ordered alphabetically or numerically, as the situation warrants. Access to a particular record within the file is accomplished by means of the search techniques discussed in the previous section. Elementary data item acquisition and user operations are performed in compliance with specific user program requirements. Other file operations, such as additions and deletions of records, are not evaluated. The number and type of operations performed on each record are not considered, since the analysis is concerned specifically with the basic machine operations necessary to locate a particular record.

C. MEDIUM ACCESS

A medium access application subroutine is defined as a single application in which less than 60 percent and more than 30 percent of the records contained within a file are accessed for the purpose of executing some type of operation. The foregoing considerations for high access also apply.
D. LOW ACCESS

A low access application subroutine is defined as a single application in which less than 30 percent of the records contained within a file are accessed for the purpose of executing some type of operation. The foregoing considerations for high access also apply.
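The three access classes can be expressed as a small classification function, using the thresholds exactly as defined above. As written, the definitions do not assign a class to exactly 30 percent, so the sketch returns `None` in that case rather than guessing. The example record counts use the 1657-record data base described in Section IV; the accessed counts are hypothetical.

```python
def access_class(records_accessed, records_in_file):
    """Classify a subroutine by the percentage of a file's records it accesses."""
    percent = 100.0 * records_accessed / records_in_file
    if percent >= 60:
        return "high"
    if percent > 30:
        return "medium"
    if percent < 30:
        return "low"
    return None  # exactly 30 percent is not covered by the definitions

for accessed in (1000, 600, 100):
    print(accessed, access_class(accessed, 1657))
```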

E. SINGLE KEY ACCESS

A single key access is defined as access to all records within a file that have a common keyword, for the purpose of performing some additional application. There may be only one such record or many.

F. MULTIPLE KEY ACCESS

A multiple key access is defined as access to all records within a file that have two or more keywords in common.
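One common way to carry out the two access types with an inverted directory is sketched below: a single key access reads one address list, and a multiple key access intersects the address lists of all the keywords involved. This is an illustrative technique, not necessarily the procedure used in the experiments, and the directory contents are hypothetical.

```python
# Hypothetical inverted directory: keyword -> set of record addresses.
DIRECTORY = {
    ("last_name", "DOE"): {2, 121},
    ("interest_1", "210"): {2, 7, 121},
    ("interest_4", "015"): {7, 121},
}

def single_key_access(directory, key):
    """All records containing one keyword (possibly just one record)."""
    return sorted(directory.get(key, set()))

def multiple_key_access(directory, keys):
    """All records containing every one of two or more keywords."""
    sets = [directory.get(k, set()) for k in keys]
    return sorted(set.intersection(*sets)) if sets else []

print(single_key_access(DIRECTORY, ("last_name", "DOE")))  # [2, 121]
print(multiple_key_access(DIRECTORY, [("last_name", "DOE"), ("interest_4", "015")]))  # [121]
```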
IV. EXPERIMENTAL PROCEDURE

A. GENERAL

1. Data Base

The data base used for this paper consisted of 1657 personnel-type records. Each record included the individual's full name and four interest codes, each represented by a three-digit number. The data base used in this study is comparable in size to any one of the many groupings of data contained in the MTACCS data base, such as the Decision Logic Table. Therefore, the statistics gathered are representative of data that might be taken from an actual command and control system.

2.  Keywords 

For the single attribute file directories, the last name elementary data
item in each record was chosen as the keyword. For the inverted type file
directories, three elementary data items in each record were chosen as
keywords: the individual's last name and two interest codes, interest-1 and
interest-4. These particular keywords were selected on the basis of
uniqueness and variability. Last names were the most nearly unique, with 957
distinct values; interest-1 had 153 distinct values and interest-4 only 28.

3.  Building  Data  Structures 

The data structures were defined in the higher-level programming language
ALGOL. ALGOL was chosen because of its facility with list processing, which
is used extensively with generalized file structures.
Upon completion of the five file structures defined on page 15 (sequential,
multilist, partially inverted, random and ring), a series of application
subroutines was executed on each in order to compare each file's
responsiveness to the different applications.

4.  Data  Sets 

Six data sets of search keywords were organized. One set was arranged
randomly. The other five were subsets of the available file, in which the
last name keywords were ordered alphabetically and the interest codes
numerically.

5.  Comparing  Structures 

In order to compare the responsiveness of the different file structure
organizations, it was necessary to count the basic machine operations, such
as logical compare, add, multiply and divide, performed on each data file.
No attempt was made to determine the assembly language instructions that
would actually be used by the IBM 360/67 to execute a particular
application. This analysis was not considered necessary, since ALGOL was
used only as a representative data base language. Also, since every machine
and every language executes these basic machine operations in a slightly
different manner, it was decided to keep the analysis at a level that would
be common regardless of the machine or language used.

Once the records were retrieved during the application subroutine runs, no
additional operations, such as adding, deleting or updating the records,
were performed. The purpose of this paper was served simply by determining
the number of basic operations of each type required to retrieve a data
record.
6. Quantifying Procedure

The basic machine operation counting procedure, also known as the
quantifying procedure, consisted of two steps: the directory search count
and the record retrieval count. For all search techniques except the binary
and random, the directory search count consisted of two logical comparison
counts each time a search keyword K' was compared to a directory keyword
K_i. The first logical comparison was required to check for the end of file,
that is, the last entry in the directory. The second was the comparison of
the search keyword with the keyword in the directory (K' = K_i). The binary
directory search count consisted of one add, one divide and one logical
comparison, plus either a subtract or a second add, for each probe that
sampled and halved the remaining directory entries. The random search count
consisted of the logical comparison, multiply, add and divide counts required
by each step of the transformation function (hashing algorithm) applied to
each search keyword. See page 37 for an explanation of this procedure.

Once the proper keyword was located by the respective directory search
technique, the record or records associated with that keyword had to be
retrieved. For all file organizations except the sequential, the record
retrieval count consisted of a single logical comparison for each record
retrieved. This comparison was required to check for the end of the
K_i-list: a null pointer or, in the case of the ring organization, the
pointer back to the beginning record of the K_i-list. The sequential
organization required no additional basic operation count once the desired
keyword was located, since each keyword was in 1-1 correspondence with a
record in the file.
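The directory search counting rules can be mechanized. The following is a minimal Python sketch, not the thesis's ALGOL program: it assumes the directory is a Python list of keywords in ascending order, and the function names `sequential_search` and `binary_search` are hypothetical. Following the text, each keyword examined by the sequential scan is charged two logical compares, and each binary-search probe is charged one add, one divide and one logical compare, plus an add or a subtract when the interval is halved (the three-way test at a probe is charged as a single logical compare, as in the text).

```python
from collections import Counter


def sequential_search(directory, target, start=0):
    """Scan from `start`, charging two logical compares per directory keyword
    examined: one end-of-directory test and one K' = K_i comparison."""
    compares = 0
    for i in range(start, len(directory)):
        compares += 2
        if directory[i] == target:
            return i, compares
    return -1, compares                  # not found


def binary_search(directory, target):
    """Binary search of a sorted directory, tallying operations per probe."""
    ops = Counter()
    lo, hi = 0, len(directory) - 1
    while lo <= hi:
        ops["add"] += 1                  # lo + hi
        ops["divide"] += 1               # (lo + hi) / 2 picks the sample point
        mid = (lo + hi) // 2
        ops["compare"] += 1              # K' against K_mid (three-way test counted once)
        if directory[mid] == target:
            return mid, ops
        if directory[mid] < target:
            ops["add"] += 1              # lo = mid + 1: keep the upper half
            lo = mid + 1
        else:
            ops["subtract"] += 1         # hi = mid - 1: keep the lower half
            hi = mid - 1
    return -1, ops


names = ["ADAMS", "BAKER", "JONES", "SMITH"]
print(sequential_search(names, "JONES"))   # (2, 6)
print(binary_search(names, "JONES"))
```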
7. Normalizing Procedure

Since the comparison dealt with different types of basic machine operators,
it was decided to normalize these operations, expressing them all in terms
of the logical compare value. The following values were used for
normalization:

Basic Machine Operator     Value
Logical Compare              1
Add                          1
Multiply                     7
Divide                      10

These values were considered to be representative of the relative
differences in execution time for most machines and languages.
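The normalization is a weighted sum of operation counts. A minimal sketch, assuming the tallies are kept in a dictionary keyed by operation name (a `Counter` works as well); the weights are those in the table above, except that the weight for subtract, which the table does not list, is assumed here to equal that of add. The function name `normalized_cost` is hypothetical.

```python
# Weights from the normalization table; "subtract" is an assumption (same as add).
WEIGHTS = {"compare": 1, "add": 1, "subtract": 1, "multiply": 7, "divide": 10}


def normalized_cost(op_counts):
    """Express a tally of basic machine operations in logical-compare units."""
    unknown = set(op_counts) - set(WEIGHTS)
    if unknown:
        raise KeyError(f"no weight for: {sorted(unknown)}")
    return sum(WEIGHTS[op] * n for op, n in op_counts.items())


# A hypothetical tally: the three divides dominate the normalized cost.
print(normalized_cost({"add": 5, "divide": 3, "compare": 3}))  # 38
```

Because a divide is weighted ten times a compare, search techniques that divide on every step (binary search, hashing) pay a noticeable premium per step under this scheme.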

B.   FILE  STRUCTURING  PROCEDURES 
1.   Sequential 

The sequential file structure program shown on page 50 consisted of a
directory called K, an array containing pointers to the alphabetically
ordered keywords (the last name elementary data items of the records), and a
file of logically ordered records. The physically contiguous nature of a
sequential file was thereby simulated; that is, for the m records in the file,

$$\text{logical } a_1 < \text{logical } a_2 < \cdots < \text{logical } a_m$$

The directory and file construction process proceeded as follows:

(1) As each separate record was read it was decomposed into its elementary
data items. A subroutine called ADD was then invoked to add the record to
the sequential file and the keyword to the directory.

(2) Each new keyword was compared to the other directory entries to
determine its logical position therein. The directory was kept in order by
means of a linked list structure.
(3) Once the keyword's appropriate logical directory position was
established, the record was constructed and its address placed in the
directory.

(4) After all records were read and the file and directory established, the
last name keywords in the directory were alphabetically ordered and their
addresses were then assigned to the array K in this sequence. This process
simulated the physical ordering of the file.
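Steps (1) through (4) can be sketched in Python as follows. This is an illustrative reconstruction, not the program on page 50: records are assumed to arrive as `(last_name, other_items)` pairs, a Python list stands in for the file (list index = record address), and `DirNode` and `build_sequential` are hypothetical names. Equal last names are kept in arrival order, since in the sequential organization every record has its own directory entry.

```python
class DirNode:
    """Entry in the ordered linked-list directory: keyword, record address, link."""
    __slots__ = ("keyword", "address", "link")

    def __init__(self, keyword, address, link=None):
        self.keyword, self.address, self.link = keyword, address, link


def build_sequential(records):
    """records: iterable of (last_name, other_items). Returns (file, K)."""
    file = []      # record storage; the list index serves as the record address
    head = None    # first entry of the alphabetically ordered directory
    for last_name, items in records:
        # step (2): walk the linked list to find the keyword's logical position;
        # "<=" places equal names after existing ones (arrival order kept)
        prev, node = None, head
        while node is not None and node.keyword <= last_name:
            prev, node = node, node.link
        # step (3): construct the record and place its address in the directory
        file.append((last_name, items))
        entry = DirNode(last_name, len(file) - 1, node)
        if prev is None:
            head = entry
        else:
            prev.link = entry
    # step (4): copy record addresses into the array K in alphabetical order
    K = []
    node = head
    while node is not None:
        K.append(node.address)
        node = node.link
    return file, K


file, K = build_sequential([("SMITH", "a"), ("ADAMS", "b"), ("JONES", "c"), ("ADAMS", "d")])
print(K)                          # [1, 3, 2, 0]
print([file[a][0] for a in K])    # ['ADAMS', 'ADAMS', 'JONES', 'SMITH']
```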

2.  Multilist 

The multilist file structure program shown on page 54 employed a linked
list directory called KEY and a file of records organized into K_i-lists.
The directory held a single occurrence of every record last name elementary
data item, in alphabetical order. Each directory entry consisted of the
keyword, the address of the first record of its K_i-list, and a pointer to
the next keyword in the directory, K_{i+1}.

The directory and file construction process proceeded as follows:

(1) As each separate record was read it was decomposed into its elementary
data items. A subroutine called ADD was then invoked to add the record to
the file and the keyword to the directory.

(2)  Each  new  keyword  was  entered  into  the  directory  in  alpha- 
betical order.   If  the  keyword  was  already  a  member  of  the  directory 
then  no  insertion  was  required  and  the  process  branched  to  step  (3). 

(3) The record was constructed with all elementary data items except the
last name. The record address was then linked to the K_i-list for that
keyword.
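The multilist construction can be sketched as below (hypothetical names, Python lists as storage). Each directory entry holds a keyword, the address of the first record on its K_i-list and a link to K_{i+1}; each stored record carries its remaining data items and a link to the next record with the same last name. The text does not say where on the K_i-list a new record is linked; this sketch assumes it is linked at the front.

```python
class KeyEntry:
    """Directory entry: keyword, first record on its K_i-list, next keyword K_{i+1}."""
    __slots__ = ("keyword", "first", "link")

    def __init__(self, keyword, link=None):
        self.keyword, self.first, self.link = keyword, None, link


def build_multilist(records):
    """records: iterable of (last_name, other_items). Returns (directory head, file)."""
    file = []      # each stored record: [other_items, address of next record on the K_i-list]
    head = None
    for last_name, items in records:
        # step (2): find the alphabetical position; insert only if the keyword is new
        prev, node = None, head
        while node is not None and node.keyword < last_name:
            prev, node = node, node.link
        if node is None or node.keyword != last_name:
            node = KeyEntry(last_name, node)
            if prev is None:
                head = node
            else:
                prev.link = node
        # step (3): store the record without its last name and link it into the
        # K_i-list (at the front here; the text does not specify the position)
        file.append([items, node.first])
        node.first = len(file) - 1
    return head, file


def retrieve(head, file, keyword):
    """Find the keyword in the directory, then follow its K_i-list to the null link."""
    node = head
    while node is not None and node.keyword != keyword:
        node = node.link
    found, address = [], (node.first if node is not None else None)
    while address is not None:
        found.append(file[address][0])
        address = file[address][1]
    return found


head, file = build_multilist([("JONES", "a"), ("ADAMS", "b"), ("JONES", "c")])
print(retrieve(head, file, "JONES"))   # ['c', 'a']
```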

3.  Partially  Inverted 

The partially inverted file structure program shown on page 58 consisted of
a three-section, nine-array directory and a file of records.

The first section of the directory consisted of the keywords, ordered
alphabetically (last name) or numerically (interest codes). They were stored
in the keyword arrays KN, K1 and K4, which provided the means for indexing
the directory by last name and interest codes. The second section consisted
of three address arrays, AN, A1 and A4, which contained the record addresses
of the K_i-list associated with each keyword. The third section also
consisted of three arrays, HN, H1 and H4, which contained h_i, the number of
record addresses associated with each keyword.

The  directory  and  file  construction  process  proceeded  as  follows: 

(1) As each separate record was read it was decomposed into its elementary
data items and the record constructed therefrom. A subroutine called ADD was
then invoked to add the keywords and their concomitant K_i-list addresses
and h_i values to the directory. This process involved three iterations of
the steps below, one iteration for each of the three keywords on which the
file was inverted.

(2)  Each  new  keyword  was  categorized  according  to  its  attributes. 
It  was  then  compared  to  the  directory  entries  within  the  indexed  portion 
of  the  directory  corresponding  to  its  attribute  category.   This  was  done 
to  determine  the  keyword's  position  in  the  directory.   Once  determined, 
following  keywords  were  each  relocated  one  array  position  higher  to  make 
the  necessary  room  for  the  new  keyword  being  inserted.   If  the  keyword 
was  already  a  member  of  the  directory  then  no  insertion  was  required  and 
the process branched to step (3).

(3) The address of the record was then linked to the K_i-list of addresses
maintained by the address arrays, and h_i was incremented by one to reflect
this latest record address addition.
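A sketch of one keyword/address/count triple (for example KN, AN and HN) and of the three-pass ADD process, assuming records are Python dictionaries with a `last_name` and a four-element `interest` list; the class, function and field names are hypothetical. The linear scan mirrors step (2), and `list.insert` performs the shift of the following entries one position higher. Keeping `H` alongside `A` is redundant in Python (it always equals `len(A[i])`) but mirrors the arrays HN, H1 and H4.

```python
class InvertedSection:
    """One keyword/address/count triple, such as KN, AN and HN."""

    def __init__(self):
        self.K = []   # keywords in ascending order
        self.A = []   # A[i]: record addresses on the K_i-list
        self.H = []   # H[i]: h_i, the number of addresses for K_i

    def add(self, keyword, address):
        i = 0
        while i < len(self.K) and self.K[i] < keyword:   # step (2): find the position
            i += 1
        if i == len(self.K) or self.K[i] != keyword:
            # new keyword: insert() moves every following entry one position higher
            self.K.insert(i, keyword)
            self.A.insert(i, [])
            self.H.insert(i, 0)
        self.A[i].append(address)                         # step (3): link the address
        self.H[i] += 1                                    # and increment h_i


def build_partially_inverted(records):
    """records: dicts with 'last_name' and a four-element 'interest' list (assumed layout)."""
    file = []
    name, int1, int4 = InvertedSection(), InvertedSection(), InvertedSection()
    for rec in records:
        file.append(rec)                                  # step (1): store the record
        address = len(file) - 1
        name.add(rec["last_name"], address)               # one iteration per keyword
        int1.add(rec["interest"][0], address)
        int4.add(rec["interest"][3], address)
    return file, name, int1, int4


file, name, int1, int4 = build_partially_inverted([
    {"last_name": "JONES", "interest": [101, 200, 300, 400]},
    {"last_name": "ADAMS", "interest": [101, 201, 301, 401]},
    {"last_name": "JONES", "interest": [150, 202, 302, 400]},
])
print(name.K, name.H)   # ['ADAMS', 'JONES'] [1, 2]
```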
4. Random

The random file structure program shown on page 68 employed the
calculation technique to locate the keyword. A separate data area was used
to store collision overflow records (see page 37). A "directory" consisting
of an array of pointers was established for each of the three keywords
selected (last name, interest-1 and interest-4); these "directories" were
named HASH1, HASH2 and HASH3, respectively. No particular hashing strategy
was used to optimize the storage area; instead, a very straightforward
technique was employed.

The transformation function (hashing algorithm) used to determine the
keyword address in the array was as follows:

(1) Each alphanumeric symbol in the keyword was matched against a string
named ALPH containing all possible symbols.

(2) Upon finding a match, the position of the symbol in the string ALPH was
multiplied by the position of the symbol in the keyword.

(3) The product of step (2) was added to the running sum of the preceding
values derived from the same keyword. This sum was kept in the variable
named TOTAL. For example, the name Jones would result in the following
operations:

LTR   STRING POSITION   WORD POSITION   RESULT   TOTAL
 J          10                1            10       10
 O          15                2            30       40
 N          14                3            42       82
 E           5                4            20      102
 S          19                5            95      197
(4) When all symbols in the keyword had been summed, the final value was
divided by a positive number, named HASHD1, HASHD2 and HASHD3 for the
keywords name, interest-1 and interest-4, respectively. The divisor was
selected on the basis of the number of storage locations allocated to hold
the addresses for each directory. In the example above the divisor was
1657; since TOTAL = 197 is smaller than the divisor, the remainder after the
division, which is the hash address, is 197.
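Steps (1) through (4) can be written as a short Python function. The text does not give the contents or order of ALPH; this sketch assumes the letters come first (A = 1, ..., Z = 26), which reproduces the positions in the Jones example, and the digits that follow them are a hypothetical extension for interest codes. The name `hash_address` is likewise hypothetical. Each symbol costs one multiply and one add, and the final reduction costs one divide.

```python
# Assumed symbol string: letters first (matches the Jones example), digits appended (hypothetical).
ALPH = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"


def hash_address(keyword, divisor):
    """Position-weighted symbol sum, reduced modulo the directory size."""
    total = 0
    for word_pos, symbol in enumerate(keyword.upper(), start=1):
        string_pos = ALPH.index(symbol) + 1   # steps (1)-(2): 1-based position in ALPH;
                                              # raises ValueError for symbols not in ALPH
        total += string_pos * word_pos        # step (3): running sum kept in TOTAL
    return total % divisor                    # step (4): remainder is the hash address


print(hash_address("Jones", 1657))  # 197
```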

(5) If two or more keywords hashed to the same location in the array (a
collision), the colliding entries were placed in a separate overflow data
area and linked to that location by the method of chaining.

In building the file structure, each new record was read and the hash
address calculated for all three keywords. The necessary directory entries
were made for each of the three keywords and their associated addresses or,
in the case of a collision, the next record address in the chain was
established before the next record was read. Therefore, no redundancy of
data existed.
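Step (5) and the building procedure can be sketched with a primary array (standing in for HASH1, HASH2 or HASH3) and a separate overflow list linked by chaining. The function names are hypothetical, and `first_letter_hash` is a deliberately crude stand-in for the transformation function, chosen only so that collisions occur in the demonstration. The text does not state exactly where a colliding entry is linked into the chain; this sketch links it directly behind the primary slot.

```python
def first_letter_hash(keyword, size):
    """Hypothetical stand-in hash that collides often (demonstration only)."""
    return ord(keyword[0]) % size


def build(entries, size, hash_fn):
    """entries: (keyword, record_address) pairs. Returns (primary, overflow)."""
    primary = [None] * size     # slot -> [keyword, address, overflow link] or None
    overflow = []               # separate overflow area: [keyword, address, link]
    for keyword, address in entries:
        slot = hash_fn(keyword, size)
        if primary[slot] is None:
            primary[slot] = [keyword, address, None]
        else:                   # collision: chain the new entry behind the slot
            overflow.append([keyword, address, primary[slot][2]])
            primary[slot][2] = len(overflow) - 1
    return primary, overflow


def lookup(primary, overflow, keyword, hash_fn):
    """Return the record addresses stored under `keyword`, following the chain."""
    entry = primary[hash_fn(keyword, len(primary))]
    addresses = []
    while entry is not None:
        key, address, link = entry
        if key == keyword:
            addresses.append(address)
        entry = None if link is None else overflow[link]
    return addresses


primary, overflow = build([("JONES", 0), ("JACKSON", 1), ("ADAMS", 2), ("JONES", 3)],
                          7, first_letter_hash)
print(lookup(primary, overflow, "JONES", first_letter_hash))  # [0, 3]
```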

5. Ring

The ring file structure program shown on page 73 employed three circular
rings, referred to in the program as NAMERING, I1RING and I4RING, which
served as directories. The keywords in each of the three rings were ordered
either alphabetically or numerically and formed a circularly linked list:
each keyword was linked to the next, and the last keyword was linked back to
the starting keyword. Additionally, the associated K_i-list for each keyword
was linked alphabetically or numerically in circular form.
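The ring organization can be sketched with circular singly linked lists: a ring of keyword nodes, each holding its own circular K_i-list of record addresses. The class and function names are hypothetical, and the input is assumed to be a dictionary from keyword to a non-empty list of record addresses. The end-of-list test is the return to the starting node rather than a null pointer, matching the retrieval count described earlier.

```python
class Node:
    __slots__ = ("value", "next")

    def __init__(self, value):
        self.value, self.next = value, None


def make_ring(values):
    """Link values, in the given order, into a circular list; return the first node."""
    nodes = [Node(v) for v in values]
    if not nodes:
        return None
    for a, b in zip(nodes, nodes[1:] + nodes[:1]):
        a.next = b                       # the last node links back to the first
    return nodes[0]


def build_ring_directory(index):
    """index: keyword -> non-empty list of record addresses.
    Each keyword node holds (keyword, first node of its circular K_i-list)."""
    return make_ring([(k, make_ring(sorted(index[k]))) for k in sorted(index)])


def ring_retrieve(start, keyword):
    """Go around the keyword ring at most once; then walk the matching K_i-list ring."""
    node = start
    while True:
        key, first = node.value
        if key == keyword:
            out, rec = [first.value], first.next
            while rec is not first:      # end test: back at the K_i-list's first record
                out.append(rec.value)
                rec = rec.next
            return out
        node = node.next
        if node is start:                # full circuit without a match
            return []


ring = build_ring_directory({"JONES": [5, 2], "ADAMS": [1], "SMITH": [4, 0, 3]})
print(ring_retrieve(ring, "SMITH"))   # [0, 3, 4]
```

Because the rings are closed, a search may start at any keyword node (for example, where the previous search stopped) and still visit every keyword once.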
C. APPLICATION SUBROUTINE PROCESSING PROCEDURES
1. Volume Access Application

The volume access application subroutines, in which a relatively large
number of records was retrieved during a single application run, consisted
of the high, medium and low access application subroutines. The data sets
of search keywords used for these application runs were sorted
alphabetically and numerically prior to processing. This step was taken
because it depicts more realistically the manner in which a tactical command
and control subsystem would actually accomplish such a task, that is, by
means of batch mode operations.

When processing the volume application subroutines, it was not necessary
to begin each keyword search at the first keyword of the directory, K_1.
Instead, as each keyword was located in the directory its position K_i was
noted, and the search for the next keyword began at position K_{i+1},
thereby eliminating any need to search previously searched keywords again.
This sequential search technique was made possible by the preordering of the
data sets of search keywords.
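The resumed scan can be sketched as follows, assuming both the directory and the batch of search keywords are sorted Python lists; the function name and the `resume` flag are hypothetical. With `resume=False` every search restarts at K_1, which shows what the preordering saves. Compares are charged as in the quantifying procedure, two per directory keyword examined.

```python
def batch_search(directory, sorted_keys, resume=True):
    """Locate each key of a pre-sorted batch in a sorted directory.
    Returns (positions, logical_compares); -1 marks a key that was not found."""
    positions, compares, start = [], 0, 0
    for key in sorted_keys:
        i = start
        while i < len(directory):
            compares += 2                 # end-of-directory test + K' = K_i comparison
            if directory[i] == key:
                break
            i += 1
        if i < len(directory):
            positions.append(i)
            if resume:
                start = i + 1             # the next search begins at K_{i+1}
        else:
            positions.append(-1)          # not found; the resume point is unchanged
    return positions, compares


directory = ["ADAMS", "BAKER", "JONES", "SMITH", "YOUNG"]
print(batch_search(directory, ["BAKER", "SMITH", "YOUNG"]))                # ([1, 3, 4], 10)
print(batch_search(directory, ["BAKER", "SMITH", "YOUNG"], resume=False))  # ([1, 3, 4], 22)
```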

Six separate application subroutine runs were executed, each with the five
differently ordered data sets. These provided the sample on which the
statistical conclusions are based. The sequential and multilist
organizations were constructed so that a directory search could be made
only on the last name keyword. Consequently, any search for an elementary
data item other than last name would require two logical comparisons for
each of the 1657 records in the file, an exorbitant total of 3314
(2 × 1657) comparisons. For this reason, no count was made of the basic
machine operations necessary to search for any keywords other than the last
name.
a. High Access

Run-1  was  performed  with  575  alphabetically  ordered  last 
name  search  keywords.   Run-2  used  these  same  last  names  plus  92 
interest-1  and  17  interest-4  numerically  ordered  interest  code  search 
keywords.   Basic  machine  operation  counting  and  normalizing  were  per- 
formed in  accordance  with  the  procedures  defined  on  page  33. 

b.  Medium  Access 

Run-3 used 288 last names; run-4 included 46 interest-1 and 9 interest-4
codes. The alphanumeric ordering, counting and normalizing procedures were
identical to those used in processing the high access category.

c.  Low  Access 

Run-5 used 96 last names; run-6 included 16 interest-1 and 3 interest-4
codes. The alphanumeric ordering, counting and normalizing procedures were
identical to those used in processing the high access category.

2.   Single  Purpose  Access  Application  Subroutines 

Single purpose access application subroutines were designed to operate in a
non-batch mode as a means of selectively accessing particular records to
accomplish one primary objective. A limit of three search keywords was
imposed in this paper. The subroutines were subdivided into two categories:
single key access and multiple key access. An example of a single key access
is a situation in which all records containing a particular interest code
must be accessed. A multiple key access might be occasioned by a
requirement to access all records that contain two or more search keywords.
The single purpose application subroutines were chosen because they best
simulate the type of operation that would occur most frequently during the
execution of a tactical command and control application subroutine. An
example is the situation where, for given attributes of a target, the data
base is searched for the possible weapon choices available for effectively
attacking the target.

Four separate application subroutine runs were executed. These runs
provided the sample on which statistical conclusions are based. Since the
sequential and multilist organizations were limited to directory searches
involving only the last name, only two of these runs were executed with
these organizations.

a.  Single  Key  Access 

The single key access category of single purpose access application
subroutines used only one search keyword. Run-7 used the last name as the
search keyword. Run-8 used as search keywords a mixture of last names and
interest-1 and interest-4 codes. A sample size of 800 was used in both runs.
The quantifying and normalizing procedures discussed on page 33 were used,
and the results are presented in Figure 10.

b.  Multiple  Key  Access 

The multiple key access category of single purpose access application
subroutines used two keywords with which to search. Run-9 searched for all
records containing a certain last name and interest code. The run included
100 samples of this operation. Each record contained all elementary data
items, so as each record was located on the basis of the first search
keyword, it was then checked for the presence of the second keyword. Thus
the search was identical to the single key access, except for an additional
logical comparison necessary to check for the presence of the second
keyword.

Run-10  searched  for  all  records  containing  a  certain  last 
name  or  interest  code.   In  this  run  it  was  necessary  to  fully  search 
out  both  search  keywords.   One  hundred  samples  of  this  operation  were 
included  in  the  run. 

Both  runs  used  the  quantifying  and  normalizing 
procedure  discussed  on  page  33.   The  results  of  these  runs  are  pre- 
sented in  Figure  10. 
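Runs 9 and 10 differ in how the second keyword is used. A sketch, assuming a dictionary index from keyword to record addresses (standing in for any of the directories) and records stored as dictionaries that carry all elementary data items; all names and sample values are hypothetical. The AND form follows the first keyword's list and spends one extra logical compare per retrieved record on the second keyword; the OR form must search out both keywords fully and merge the two address lists.

```python
def and_search(index, records, key1, field2, value2):
    """Run-9 style: follow key1's address list and test each retrieved record
    for the second keyword (one extra logical compare per record)."""
    hits, compares = [], 0
    for address in index.get(key1, []):
        compares += 1
        if records[address][field2] == value2:
            hits.append(address)
    return hits, compares


def or_search(index_a, key_a, index_b, key_b):
    """Run-10 style: search out both keywords in full and merge the address lists."""
    return sorted(set(index_a.get(key_a, [])) | set(index_b.get(key_b, [])))


records = [
    {"last_name": "JONES", "interest1": 101},
    {"last_name": "JONES", "interest1": 150},
    {"last_name": "ADAMS", "interest1": 101},
]
by_name = {"JONES": [0, 1], "ADAMS": [2]}
by_interest1 = {101: [0, 2], 150: [1]}
print(and_search(by_name, records, "JONES", "interest1", 101))  # ([0], 2)
print(or_search(by_name, "ADAMS", by_interest1, 150))            # [1, 2]
```

This is consistent with Figure 10, where every organization that could run both tests needed more operations per keyword for Run-10 than for Run-9.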
V. PRESENTATION OF DATA

For each combination of run and file type in the volume access application
subroutines, five data sets were executed; thus 150 programs of this type
were run to gather data. For each of these groups of five, a mean was
computed and then divided by the number of keywords searched for in the
directory. In the single purpose access application subroutines, 20
programs were run to gather data, and the results were divided by the
number of keywords searched for in the directory.

The results of these computations yielded the average number of basic
machine operations required per keyword search. These values are presented
in Figure 10.
                                             FILE ORGANIZATION
                                             PARTIALLY INVERTED
APPLICATION SUBROUTINE             SEQ      ML     SEQ     BIN  RANDOM    RING

VOLUME   HIGH          RUN-1         4       4      10     140      67       7
                       RUN-2         X       X      14     136      76       9
         MEDIUM        RUN-3         5       7      10     140      67       7
                       RUN-4         X       X      14     135      74       9
         LOW           RUN-5         6       9      13     140      66       9
                       RUN-6         X       X      17     135      74      11

SINGLE   SINGLE KEY    RUN-7      1997     805    1207     139      67     805
                       RUN-8         X       X    1174     129      77     782
         MULTIPLE KEY  RUN-9      1596     958    1192     136      72     958
                       RUN-10        X       X    1692     252     170    1116

X: no count made (these organizations could be searched only by last name).

Figure 10.
Number of Basic Machine Operations Required Per Keyword Search
VI. USE OF RESULTS IN DATA STRUCTURE DESIGN

The primary purpose of the Marine Corps MTACCS test bed located at Camp
Pendleton is to develop the specific operational requirements for the
MTACCS subsystems. The test bed system hardware and software consist of
off-the-shelf, commercially available products. Consequently, the result is
a rather slow and inefficient system. The Marine Corps is already
considering the purchase of a faster computer for the test bed because
response times for application programs have been poor. In addition to
operating system peculiarities, file organization is a major cause of this
system's slowness. However, as yet no major effort has been made by test bed
personnel to optimize software aspects of the future MTACCS system, such as
file organization.

In the process of analyzing the differences among the file organizations
studied in this paper, it became obvious that while file organization can
make major differences in searching efficiency, the organization of the
individual record elementary data items can be even more fundamental to
overall system searching efficiency.

It is evident that minimizing the number of fields that must be searched as
keywords improves the efficiency of searching a file. An example of the
manner in which inefficient record design impedes file searching is the
Decision Logic Table incorporated within the MIFASS data base. This data
structure is searched by performing an index-sequential search on one
keyword. When all associated records have been extracted, five additional
sequential searches are made, in which any record that does not contain the
remaining keywords is eliminated. This process requires many logical
comparisons and is inherently slow.

Files should be scrutinized for possible ways to optimize their use. For
example, analysis of the Decision Logic Table reveals that it can be
organized into two separate but related portions, which could be referred
to as the search portion and the weapons selection portion. The search
portion consists of six elementary data items, all of which could be
represented by a relatively small number of bits, since each item need
represent only a few values. The following list presents the requirements
of these elementary data items:

ELEMENTARY DATA ITEM             POSSIBLE VALUES    BITS REQUIRED
Target Type                            14                 4
Target Sub-Type                        74                 7
Target Degree of Protection            10                 4
Proximity of Friendly Troops            2                 1
Anti-Air Artillery Protected            2                 1
Target Mobility                         3                 2
Total                                                    19
Thus we have shown that the portion of the record requiring search could be
encoded in only 19 bits (less than one word). This would allow the desired
record to be located with one logical comparison per record instead of six.
Additionally, the records could be placed in a random (hash coded)
organization, thereby expediting the search to an even greater degree.
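A sketch of the 19-bit encoding in Python. The field widths are derived from the table above as the number of bits needed to represent codes 0 through n - 1; the order of the fields within the word, the field names, and the assumption that each item is coded as an integer starting at 0 are all hypothetical. Once a query is packed, matching a record takes one integer comparison instead of six field comparisons.

```python
# (field name, number of possible values) from the table; names and order are assumptions.
FIELDS = [
    ("target_type", 14),
    ("target_subtype", 74),
    ("degree_of_protection", 10),
    ("friendly_troops_near", 2),
    ("aaa_protected", 2),
    ("mobility", 3),
]
# Bits needed for codes 0..n-1: 4, 7, 4, 1, 1, 2 (19 in total).
WIDTHS = [(name, (n - 1).bit_length()) for name, n in FIELDS]


def pack(values):
    """Pack a dict of field codes into one integer search key."""
    key = 0
    for (name, n), (_, width) in zip(FIELDS, WIDTHS):
        code = values[name]
        if not 0 <= code < n:
            raise ValueError(f"{name} code out of range: {code}")
        key = (key << width) | code
    return key


def unpack(key):
    """Recover the field codes from a packed key."""
    values = {}
    for name, width in reversed(WIDTHS):
        values[name] = key & ((1 << width) - 1)
        key >>= width
    return values


def find(packed_table, query):
    """Indexes of records whose packed search portion equals the packed query."""
    target = pack(query)
    return [i for i, packed in enumerate(packed_table) if packed == target]  # one compare each


q = {"target_type": 13, "target_subtype": 73, "degree_of_protection": 9,
     "friendly_troops_near": 1, "aaa_protected": 0, "mobility": 2}
print(sum(w for _, w in WIDTHS))  # 19
```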
The weapons selection portion provides a list of weapons that will be
effective against a particular type of target. Considerably longer than the
search portion, this section includes such information as the preferred
weapon, ordnance, fuse, weapon category, probability of kill, weapon CEP and
weapon radius of effectiveness. It appears that all entries are repeated
numerous times throughout the file, indicating the desirability of
eliminating this redundancy and consequently reducing the storage required.
A list of pointers may be used to link the search portion of the record to
the weapons selection portion. This would appear to simplify the handling of
this cumbersome portion of the file.
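A sketch of the pointer scheme, assuming the table arrives as (search key, weapon entry) pairs, with each weapon entry a tuple of its attributes; the function name and the sample entries are hypothetical placeholders. Each distinct weapon entry is stored once, and each search key keeps a list of pointers (indexes) into that shared storage.

```python
def separate_weapons(rows):
    """rows: (search_key, weapon_entry) pairs; weapon_entry is a hashable tuple.
    Returns (weapons, pointers): each distinct entry stored once, and for each
    search key the list of indexes into `weapons`."""
    weapons, index_of, pointers = [], {}, {}
    for key, entry in rows:
        if entry not in index_of:          # first occurrence: store it once
            index_of[entry] = len(weapons)
            weapons.append(entry)
        pointers.setdefault(key, []).append(index_of[entry])
    return weapons, pointers


rows = [
    (101, ("WEAPON-A", "ORDNANCE-1", "FUSE-X")),
    (102, ("WEAPON-A", "ORDNANCE-1", "FUSE-X")),   # repeated entry
    (102, ("WEAPON-B", "ORDNANCE-2", "FUSE-Y")),
]
weapons, pointers = separate_weapons(rows)
print(len(weapons), pointers)  # 2 {101: [0], 102: [0, 1]}
```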

This is presented as an example of the type of analysis that might be
performed on files toward the goal of optimizing their structures for the
application subroutines to be used. Future thesis work could address the
following areas:

(1)  Analyzing  specific  MTACCS  files 

(2)  Determining  feasibility  of  file  reorganization 

(3)  Developing  new  techniques  for  compacting  files 

(4)  Developing  necessary  macro  instructions  for  bit 
string  processing. 

It  should  be  emphasized  that  prior  to  proceeding  with  research  in  this 
area  the  student  should  spend  several  days  at  the  Camp  Pendleton  test 
bed  to  familiarize  himself  with  the  MTACCS  system  in  use. 
VII. CONCLUSIONS

Before the file organization is determined, the organization and structure
of the elementary data items in the records of the file must be analyzed in
order to optimize the searching efficiency of the file.

The search technique of the sequential file organization was found to be
superior, on a per-keyword basis, to those of the other file organizations
when the three volume access application subroutines were applied to the
data base. The multilist, partially inverted and ring organizations, all of
which lend themselves to sequential searching, functioned in a
semi-efficient manner because all search keywords could be located in a
single pass through the directory. However, the partially inverted file with
the binary search technique, as well as the random file organization,
performed unsatisfactorily when subjected to the volume access application
subroutines. This was because the directory search techniques employed could
not take advantage of the ordered search keywords and proceed sequentially
through the file; instead, a complete cycle of the search technique was
required for each search keyword lookup.

When the search techniques using single purpose access application
subroutines were applied to the data base, the random organization was
found to be superior to the other file organizations. This organization was
an order of magnitude better than any of the others except the partially
inverted file with the binary search technique. Among the organizations
considered, it would appear to be the most suitable for a file like the
Decision Logic Table, where very few table updates are required and the
primary processing will be queries. Interestingly, the sequential file
organization, while the best for volume accesses, was the least effective
for single purpose accesses.